label proportion
Robust Label Proportions Learning
Learning from Label Proportions (LLP) is a weakly-supervised paradigm that uses bag-level label proportions to train instance-level classifiers, offering a practical alternative to costly instance-level annotation. However, the weak supervision makes effective training challenging, and existing methods often rely on pseudolabeling, which introduces noise. To address this, we propose RLPL, a twostage framework. In the first stage, we use unsupervised contrastive learning to pretrain the encoder and train an auxiliary classifier with bag-level supervision. In the second stage, we introduce an LLP-OTD mechanism to refine pseudo-labels and split them into high-and low-confidence sets. These sets are then used in LLPMix to train the final classifier. Extensive experiments and ablation studies on multiple benchmarks demonstrate that RLPL achieves comparable state-of-the-art performance and effectively mitigates pseudo-label noise.
Learnability of Linear Thresholds from Label Proportions
We study the problem of properly learning linear threshold functions (LTFs) in the learning from label proportions (LLP) framework. In this, the learning is on a collection of bags of feature-vectors with only the proportion of labels available for each bag. First, we provide an algorithm that, given a collection of such bags each of size at most two whose label proportions are consistent with (i.e., the bags are satisfied by) an unknown LTF, efficiently produces an LTF that satisfies at least (2/5)-fraction of the bags. If all the bags are non-monochromatic (i.e., bags of size two with differently labeled feature-vectors) the algorithm satisfies at least (1/2)-fraction of them. For the special case of OR over the d-dimensional boolean vectors, we give an algorithm which computes an LTF achieving an additional โฆ(1/d) in accuracy for the two cases.
beta-risk: a New Surrogate Risk for Learning from Weakly Labeled Data
Valentina Zantedeschi, Rรฉmi Emonet, Marc Sebban
During the past few years, the machine learning community has paid attention to developing new methods for learning from weakly labeled data. This field covers different settings like semi-supervised learning, learning with label proportions, multi-instance learning, noise-tolerant learning, etc. This paper presents a generic framework to deal with these weakly labeled scenarios. We introduce the ฮฒ-risk as a generalized formulation of the standard empirical risk based on surrogate marginbased loss functions. This risk allows us to express the reliability on the labels and to derive different kinds of learning algorithms. We specifically focus on SVMs and propose a soft margin ฮฒ-SVM algorithm which behaves better that the state of the art.